Use mutate to add new variables or to modify existing ones.

Add new variables

For example, the pulse dataset has two pulse measurements, pulse1 and pulse2. Suppose we are interested in the average pulse and want it available as a separate variable, averagePulse, in the pulse tibble. We can create it with:

mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 × 14
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 2 more variables: year <dbl>, averagePulse <dbl>, and
#   abbreviated variable name ¹​exercise

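By default, mutate adds the new column as the last column of the tibble. Because averagePulse does not fit in the printed width, it appears in the footer of the output rather than among the displayed columns.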
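The position of the new column can be controlled with the .before and .after arguments of mutate (available in dplyr 1.0.0 and later). The sketch below assumes dplyr is installed and uses pulse_small, a hypothetical stand-in for pulse built from a few columns of the first three printed rows, rather than the full dataset:

```r
library(dplyr)

# pulse_small: hypothetical stand-in for pulse, using values from the
# first three printed rows and only a subset of the columns
pulse_small <- tibble(
  id     = c("1993_A", "1993_B", "1993_C"),
  height = c(173, 179, 167),
  weight = c(57, 58, 62),
  pulse1 = c(86, 82, 96),
  pulse2 = c(88, 150, 176)
)

# Default: the new column is appended as the last column
at_end <- mutate(pulse_small, averagePulse = (pulse1 + pulse2) / 2)
names(at_end)

# .after = id places the new column directly after id
after_id <- mutate(pulse_small, averagePulse = (pulse1 + pulse2) / 2,
                   .after = id)
names(after_id)

# .before = 1 places the new column in the first position
first_col <- mutate(pulse_small, averagePulse = (pulse1 + pulse2) / 2,
                    .before = 1)
names(first_col)
```

The same arguments should work unchanged when pulse_small is replaced by the full pulse tibble.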

Does the pulse tibble now contain the variable averagePulse?

No. mutate returns a modified copy and leaves the original tibble unchanged. To keep the new variable averagePulse, use the assignment operator <- to replace the original pulse tibble with the modified version:

pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
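One way to convince yourself that mutate leaves its input unchanged is to check the column names before and after reassignment. A minimal sketch, assuming dplyr is loaded and using a hypothetical two-row stand-in (pulse_small) taken from the first printed rows:

```r
library(dplyr)

# Hypothetical stand-in: pulse values of the first two printed rows
pulse_small <- tibble(pulse1 = c(86, 82), pulse2 = c(88, 150))

# mutate() returns a new tibble; pulse_small itself is not changed
mutate(pulse_small, averagePulse = (pulse1 + pulse2) / 2)
"averagePulse" %in% names(pulse_small)   # FALSE

# Assigning the result back is what makes the change stick
pulse_small <- mutate(pulse_small, averagePulse = (pulse1 + pulse2) / 2)
"averagePulse" %in% names(pulse_small)   # TRUE
```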


As another example, take the body mass index (BMI): \[\mathrm{BMI}=\frac{\mathrm{weight}_{\mathrm{kg}}}{\mathrm{height}_{\mathrm{m}}^2}\]

Note that the BMI definition requires weight in kilograms and height in metres. In the pulse dataset, weight is given in kilograms but height is in centimetres, so we first create a new variable height_metre containing the height in metres and then calculate BMI:

pulse_bmi <- mutate(pulse, height_metre = height / 100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 × 14
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 2 more variables: year <dbl>, height_metre <dbl>, and
#   abbreviated variable name ¹​exercise

The pulse_bmi tibble now has the height in metres, so we can calculate BMI:

pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2)) 
pulse_bmi
# A tibble: 110 × 15
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 3 more variables: year <dbl>, height_metre <dbl>, BMI <dbl>,
#   and abbreviated variable name ¹​exercise
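As a hand check of the calculation mutate performs for every row, take the first participant shown in the output (height 173 cm, weight 57 kg). The BMI column is not among the printed columns, so this value is worked out from the formula rather than read from the output:

\[\mathrm{BMI}=\frac{57}{(173/100)^2}=\frac{57}{1.73^2}=\frac{57}{2.9929}\approx 19.05\]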

Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:

pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2)) 
pulse_bmi
# A tibble: 110 × 14
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 2 more variables: year <dbl>, BMI <dbl>, and abbreviated
#   variable name ¹​exercise
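A third option is to create height_metre and BMI in a single mutate call: within one call, later expressions can refer to columns created by earlier ones. The sketch below uses a hypothetical stand-in, pulse_small, holding the heights and weights of the first three printed rows, and compares the result with the direct formula:

```r
library(dplyr)

# Hypothetical stand-in: heights (cm) and weights (kg) of three printed rows
pulse_small <- tibble(height = c(173, 179, 167), weight = c(57, 58, 62))

# height_metre is created first and is already available when BMI is computed
bmi_one_call <- mutate(pulse_small,
                       height_metre = height / 100,
                       BMI = weight / height_metre^2)

# Same BMI values as the direct formula used above
bmi_direct <- mutate(pulse_small, BMI = weight / ((height / 100)^2))
all.equal(bmi_one_call$BMI, bmi_direct$BMI)   # TRUE
```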

Update variables

In the examples above we added new variables to the dataset, but mutate can also update an existing variable. For example, suppose we want age expressed (roughly) in days instead of years:

mutate(pulse, age=age*365)
# A tibble: 110 × 13
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57  6570 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58  6935 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62  6570 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84  6570 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64  6570 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74  8030 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57  7300 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55  6570 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56  6935 female no     yes     high    sat       68     68
10 1993_J Troy     168     60  8395 male   no     yes     modera… ran       88    150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
#   ¹​exercise

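Here we keep the variable age but change its unit from years to days. Because the name age already exists, mutate overwrites that column instead of adding a new one, so the tibble still has 13 columns.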
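Two details are worth knowing when overwriting a column: the column keeps its name and position, and the expressions inside one mutate call are evaluated in order, so a later expression sees the already-updated value. A short sketch with a hypothetical stand-in tibble (pulse_small) built from the first three printed ages; the column name age_check is purely illustrative:

```r
library(dplyr)

# Hypothetical stand-in: ids and ages (years) of three printed rows
pulse_small <- tibble(id = c("1993_A", "1993_B", "1993_C"), age = c(18, 19, 18))

# Overwriting an existing column keeps its name and position
age_days <- mutate(pulse_small, age = age * 365)
names(age_days)

# age_check is computed from the updated age (in days), not the
# original value in years, so it recovers the ages in years
seq_eval <- mutate(pulse_small, age = age * 365, age_check = age / 365)
seq_eval$age_check
```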

Another example is converting height and weight from metric to imperial units, using 1 kg ≈ 2.2 lb and 1 inch = 2.54 cm; after this call, height is in inches and weight in pounds:

mutate(pulse, height=height/2.54, weight=weight*2.2)
# A tibble: 110 × 13
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…   68.1   125.    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…   70.5   128.    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…   65.7   136.    18 female no     yes     high    ran       96    176
 4 1993_D Trav…   76.8   185.    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri   68.1   141.    18 female no     yes     low     sat       90     88
 6 1993_F Geor…   72.4   163.    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…   63.8   125.    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…   66.5   121     18 female no     yes     modera… sat       71     77
 9 1993_I Sonja   64.6   123.    19 female no     yes     high    sat       68     68
10 1993_J Troy    66.1   132     23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
#   ¹​exercise
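Overwriting height and weight means the metric values are no longer in the result. If you want to keep both, one option is to give the converted columns new names; height_in and weight_lb below are illustrative names, and round() is used only to tidy the display. pulse_small is a hypothetical two-row stand-in built from the printed output:

```r
library(dplyr)

# Hypothetical stand-in: heights (cm) and weights (kg) of two printed rows
pulse_small <- tibble(height = c(173, 179), weight = c(57, 58))

# New column names keep the metric originals alongside the imperial values
imperial <- mutate(pulse_small,
                   height_in = round(height / 2.54, 1),
                   weight_lb = round(weight * 2.2, 1))
imperial
```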

if_else(condition, true, false, …)

In the previous examples we used mutate to add or update variables with simple arithmetic, applying the same calculation to every value. However, there are situations where we want to treat values conditionally. This is possible with the helper function if_else, which returns the true value where condition is TRUE and the false value where it is FALSE.
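Outside of mutate, if_else can be tried on a plain vector to see how it behaves. In this sketch, x is a hypothetical vector; the missing argument gives the value to use where the condition is NA, and, unlike base R's ifelse, if_else refuses to combine true and false values of incompatible types:

```r
library(dplyr)

# Hypothetical values, including a missing one
x <- c(5, 12, NA)

# "high" where x > 10, "low" where not, "unknown" where the condition is NA
if_else(x > 10, "high", "low", missing = "unknown")
# returns c("low", "high", "unknown")

# A number for `true` and a string for `false` is an error;
# try() reports it without stopping the script
try(if_else(x > 10, 1, "low"))
```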

Examples:

Add a new variable max_pulse reporting the higher pulse rate of the two measurements pulse1 and pulse2 for each observation:

mutate(pulse, max_pulse=if_else(pulse1 < pulse2, pulse2, pulse1))
# A tibble: 110 × 14
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 2 more variables: year <dbl>, max_pulse <dbl>, and
#   abbreviated variable name ¹​exercise
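For this particular case, base R's pmax, which returns the element-wise maximum of its arguments, gives the same result as the if_else comparison. A sketch using a hypothetical stand-in (pulse_small) with the pulse values of three printed rows:

```r
library(dplyr)

# Hypothetical stand-in: pulse1/pulse2 of printed rows 1, 2 and 4
pulse_small <- tibble(pulse1 = c(86, 82, 71), pulse2 = c(88, 150, 73))

with_ifelse <- mutate(pulse_small,
                      max_pulse = if_else(pulse1 < pulse2, pulse2, pulse1))
# pmax() compares pulse1 and pulse2 row by row and keeps the larger value
with_pmax <- mutate(pulse_small, max_pulse = pmax(pulse1, pulse2))

identical(with_ifelse$max_pulse, with_pmax$max_pulse)   # TRUE
```

Both versions return NA for a row where either pulse is missing; pmax(pulse1, pulse2, na.rm = TRUE) would instead ignore the missing value.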

Add a logical variable adult that is TRUE if age >= 18 and FALSE otherwise:

mutate(pulse, adult = if_else(age >= 18, TRUE, FALSE))
# A tibble: 110 × 14
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 female no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 female no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 female no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 male   no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 female no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 male   no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 female no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 female no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 female no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 male   no     yes     modera… ran       88    150
# … with 100 more rows, 2 more variables: year <dbl>, adult <lgl>, and abbreviated
#   variable name ¹​exercise
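Since a comparison such as age >= 18 already evaluates to TRUE, FALSE or NA, the if_else call is not strictly needed here. The sketch below compares the two forms on a hypothetical tibble whose ages 17 and NA are invented for illustration; the two columns come out identical:

```r
library(dplyr)

# Hypothetical ages: 17 and NA are made up to show the FALSE and NA cases
pulse_small <- tibble(age = c(18, 17, NA))

adults <- mutate(pulse_small,
                 adult_if_else = if_else(age >= 18, TRUE, FALSE),
                 adult_direct  = age >= 18)  # the comparison is already logical
adults
```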

Convert the gender values from female to f and from male to m:

mutate(pulse, gender = if_else(gender == 'female', 'f', 'm'))
# A tibble: 110 × 13
   id     name  height weight   age gender smokes alcohol exerc…¹ ran   pulse1 pulse2
   <chr>  <chr>  <dbl>  <dbl> <dbl> <chr>  <chr>  <chr>   <chr>   <chr>  <dbl>  <dbl>
 1 1993_A Bonn…    173     57    18 f      no     yes     modera… sat       86     88
 2 1993_B Mela…    179     58    19 f      no     yes     modera… ran       82    150
 3 1993_C Cons…    167     62    18 f      no     yes     high    ran       96    176
 4 1993_D Trav…    195     84    18 m      no     yes     high    sat       71     73
 5 1993_E Lauri    173     64    18 f      no     yes     low     sat       90     88
 6 1993_F Geor…    184     74    22 m      no     yes     low     ran       78    141
 7 1993_G Cher…    162     57    20 f      no     yes     modera… sat       68     72
 8 1993_H Fran…    169     55    18 f      no     yes     modera… sat       71     77
 9 1993_I Sonja    164     56    19 f      no     yes     high    sat       68     68
10 1993_J Troy     168     60    23 m      no     yes     modera… ran       88    150
# … with 100 more rows, 1 more variable: year <dbl>, and abbreviated variable name
#   ¹​exercise
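The if_else version maps every value that is not exactly "female" to "m", including typos or other categories. When the mapping should be explicit, case_when is one alternative: values that match no condition become NA. A sketch with a hypothetical tibble whose third value ("Female") is invented to show the difference:

```r
library(dplyr)

# Hypothetical values: "Female" is made up to show an unexpected spelling
pulse_small <- tibble(gender = c("female", "male", "Female"))

recoded <- mutate(pulse_small,
  # if_else(): anything that is not exactly "female" becomes "m"
  gender_if_else = if_else(gender == "female", "f", "m"),
  # case_when(): each value is mapped explicitly; unmatched values become NA
  gender_case_when = case_when(
    gender == "female" ~ "f",
    gender == "male"   ~ "m"
  )
)
recoded
```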


Copyright © 2023 Biomedical Data Sciences (BDS) | LUMC